Abstract The advent and availability of technology has brought us closer
than ever through social networks. Consequently, there is a growing emphasis 
on mining social networks to extract information for knowledge and discov- 
ery. However, methods for Social Network Analysis (SNA) have not kept pace 
with the data explosion. In this review, we describe directed and undirected
Probabilistic Graphical Models (PGMs) and survey recent applications to
social networks. Modern SNA is flooded with challenges that arise from the 
inherent size, scope, and heterogeneity of both the data and underlying pop- 
ulation. As a flexible modeling paradigm, PGMs can be adapted to address 
some SNA challenges. Such challenges are common themes in Big Data appli- 
cations, but must be carefully considered for reliable inference and modeling. 
For this reason, we begin with a thorough description of data collection and 
sampling methods, which are often necessary in social networks, and underlie 
any downstream modeling efforts. PGMs in SNA have been used to tackle 
current and relevant challenges, including the estimation and quantification 
of importance, propagation of influence, trust (and distrust), link and profile 
prediction, privacy protection, and news spread through micro-blogging. We 
highlight these applications, and others, to showcase the flexibility and pre- 
dictive capabilities of PGMs in SNA. Finally, we conclude with a discussion 
of challenges and opportunities for PGMs in social networks. 


Keywords Probabilistic Graphical Modeling - Social Network Analysis - 
Bayesian Networks - Markov Networks - Exponential Random Graph Models - 
Markov Logic Networks - Social Influence - Network Sampling 
1 Introduction


Over forty years ago, social scientist Allen Barton stated that “If our aim is 
to understand people’s behavior rather than simply to record it, we want to 
know about primary groups, neighborhoods, organizations, social circles, and 
communities; about interaction, communication, role expectations, and social 
control.” (Barton, 1968 as reported in Freeman, 2004). This sentiment is fun- 
damental to the concept of modularity. The importance of structural relation- 
ships in defining communities and predicting future behaviors has long been 
recognized, and is not restricted to the social sciences [48]. 


Social Network Analysis (SNA) has a rich history that is based on the 
defining principle that links between actors are informative. The advent and 
availability of Internet technology has created an explosion in online social net- 
works and a transformation in SNA. The analysis of today’s social networks is 
a difficult Big Data problem, which requires the integration of statistics and 
computer science to leverage networks for knowledge mining and discovery [99]. 
Historically, SNA scientists had to rely on tractable records of social interactions and
experiments (e.g., Milgram's small world experiment); now they have the luxury of
accessing huge digital databases of relational social data. However, this gain in
information comes at a price: many of the statistical tools for analyzing such
databases break down due to the enormity of social networks and the complex
interdependencies within the data. False discovery rates are not easily controlled,
which makes the identification of meaningful signals and relationships difficult [42].
Moreover, sampling networks is typically required, which can propagate selection bias
into downstream inference procedures.


SNA relies on diverse data representations and relational information, 
which may include, among others, tracked relationships among actors, events,
and other covariate information [130]. Modeling social networks is especially 
challenging due to the heterogeneity of the populations represented, and the 
broad spectrum of information represented in the data itself. In this review, we 
focus on Probabilistic Graphical Models (PGMs), a flexible modeling paradigm, 
which has been shown to be an effective approach to modeling social networks [81,91].
Modern applications, including the estimation of influence, privacy protection,
trust (and distrust), microblogging, and web browsing, are
presented to highlight the flexibility and utility of PGMs in addressing cur- 
rent and relevant problems in modern SNA. 


PGMs provide a compact representation of a high-dimensional joint prob- 
ability distribution of variables, by utilizing conditional independencies in the 
network of these variables; such a network, with local (in)dependency specifications,
is called a model. PGM modeling is rooted in probabilistic reasoning and querying,
and PGMs can also be used for generative purposes (sampling) [81]. In
this review, we outline the basic theory, and model parameter and structural 
learning, but emphasize practical application and implementation of these 
models to solve modern problems in SNA. We describe some of the unique
statistical challenges that arise in using PGMs in SNA. The challenges are not 
isolated to PGMs. Rather, they propagate from the very foundation of the 
model - the data, through the local statistical models of the links and nodes, 
and finally to the graphical model. This review is organized from the bottom- 
up: from data sampling, to directed and undirected graphical models. 


This paper is structured as follows. Section 2 provides an overview of
data collection methods for SNA, reviews the challenges that arise in network 
sampling, and cites some network data repositories. In Section 3, directed 
probabilistic graphical models, static and dynamic, are discussed accompanied 
by application examples in SNA. Section 4 turns to undirected graphical model 
types and their applications. Section 5 concludes the paper and outlines future 
directions and challenges for PGM-based research in SNA. 


2 Data collection and sampling 


Data collection from social networks is a fundamental challenge that inherently 
affects downstream analysis through sampling bias [11,19]. The reproducibility and
generalizability of any statistical analysis depend critically on the sample
population and how representative it is of the true population. In traditional
observational and clinical studies, randomization and large
sample size are important aspects of experimental design [28]. The object of 
a study may be driven by attributes such as the presence of a disease, or a 
covariate such as profession, age, preferences, etc. In contrast, SNA focuses 
primarily on the relations among actors, not the actors themselves and their 
individual attributes. For this reason, the population is not usually composed
of actors sampled independently; rather, the sampling scheme is driven by ties 
among the actors. 


Snowball sampling begins with an actor, or a set of actors, and moves 
through the network by sampling ties [13]. Snowball methods are useful for 
identifying modules within a population, e.g., leaders, sub-cultures, and communities.
A major limitation is the inability to include isolated actors, who are not tied to
the sampled actors but may be informative to the analysis. Other disadvantages
include the overestimation of connectivity, and the sensitivity of the sample to 
the initialization setting(s) of the snowball(s). Improvements on snowball sam- 
pling have been proposed to address some of these limitations [8, 44, 66, 133]. 
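To make the link-tracing idea concrete, the following is a minimal Python sketch of one common snowball variant. It assumes an undirected network stored as an adjacency dictionary and assumes that every tie of a sampled actor is traced in each wave (many designs instead trace only a fixed number of ties per actor). The function name `snowball_sample` and the toy network are hypothetical. The isolated node `z` illustrates the limitation noted above: it is never reached, whatever the number of waves.

```python
from collections import deque


def snowball_sample(adj, seeds, waves):
    """Return the actors reached within `waves` link-tracing steps of `seeds`.

    adj: dict mapping each actor to an iterable of its neighbours (undirected ties).
    Hypothetical helper: traces every tie of each sampled actor in each wave.
    """
    sampled = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, wave = frontier.popleft()
        if wave == waves:
            continue  # do not trace ties beyond the last wave
        for nbr in adj.get(node, ()):
            if nbr not in sampled:
                sampled.add(nbr)
                frontier.append((nbr, wave + 1))
    return sampled


# Toy network: "z" is an isolate, so it is unreachable from seed "a".
adj = {
    "a": ["b", "c"], "b": ["a", "d"], "c": ["a"],
    "d": ["b", "e"], "e": ["d"], "z": [],
}
print(sorted(snowball_sample(adj, ["a"], waves=1)))  # ['a', 'b', 'c']
print(sorted(snowball_sample(adj, ["a"], waves=3)))  # ['a', 'b', 'c', 'd', 'e']
```

Because the seeds fully determine which component is explored, rerunning the sketch from a different seed set is a quick way to see the sensitivity to initialization mentioned above.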


An alternative approach is to target actors in an ego-centric manner. There 
are two main sampling designs, with and without alter peer connections [63]. 
In this setting, a set of focal actors is selected, and their first-level ties are 
identified. In ego-centric networks with alter connections, those first-level ties 
are examined to determine connections between them. Ego-centric networks without
alter connections rely solely on focal actors and their first-level ties; with this
approach, extrapolation and generalization to the whole network are
not possible.
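The difference between the two ego-centric designs can be shown with a small sketch. It assumes the full network is available as an undirected edge list (in practice only the observed part would be collected); the function `ego_sample` and the toy edges are hypothetical.

```python
def ego_sample(edges, egos, with_alter_ties=True):
    """Ties observed in an ego-centric sample drawn from an undirected edge list.

    Ties involving a focal actor (ego) are always recorded; ties among the egos'
    alters are recorded only when with_alter_ties is True. Hypothetical helper.
    """
    egos = set(egos)
    # First-level ties: every actor adjacent to a focal actor.
    alters = ({v for u, v in edges if u in egos} | {u for u, v in edges if v in egos}) - egos
    observed = []
    for u, v in edges:
        if u in egos or v in egos:
            observed.append((u, v))  # tie involving a focal actor
        elif with_alter_ties and u in alters and v in alters:
            observed.append((u, v))  # tie between two alters
    return observed


edges = [("ego", "a"), ("ego", "b"), ("a", "b"), ("b", "c"), ("c", "d")]
print(ego_sample(edges, ["ego"]))                         # [('ego', 'a'), ('ego', 'b'), ('a', 'b')]
print(ego_sample(edges, ["ego"], with_alter_ties=False))  # [('ego', 'a'), ('ego', 'b')]
```

Neither design observes the tie `("b", "c")` or anything beyond it, which is why extrapolation to the whole network is limited.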


Online Social Networks (OSNs) present unique challenges due to their massive size
and the heterogeneous nature of their attributes. A number of factors complicate
the data collection process. Individuals can customize personal privacy settings,
preventing crawlers from obtaining information and ultimately creating a missing
data problem for the analyst. The diversity and dynamic nature of the data itself
make pages difficult to parse for collection purposes. Furthermore, sampling is
critical for tractable inference and analyses of large-scale OSNs. In most OSNs, we
are faced with hidden populations, i.e., with unknown population size or unknown
underlying distributions of the variables (edges or actors). In these cases, access
to the network is facilitated through neighbors only. Crawling (through neighbors),
either by random walks or graph traversals, is one of the most widely used network
exploration techniques for OSNs.


— Random Walk: The Metropolis-Hastings (M-H) algorithm is a widely used Markov
Chain Monte Carlo (MCMC) method for sampling social networks [26].
The random walk starts at a random (or targeted) node and proceeds iteratively,
moving from node $i$ to node $j$ according to a transition probability.
As the number of steps $n \to \infty$, the sampling distribution approaches the
stationary distribution of actor characteristics, as if each sampled individual were
uniformly drawn from the underlying population. In practice, heuristic diagnostics
are performed to assess convergence; the success of the method can also depend on
the starting point of the chain. Even with multiple chains, mixing can be slow and
the chain can get stuck in regions of the graph. Note that these features are common
to applications of MCMC methods, and not restricted to OSNs [52].


— Graph traversals: Several graph traversal methods have been applied to OSNs.
These techniques differ only slightly in the order in which they systematically
visit nodes in the network. Breadth-first search (BFS) and snowball sampling visit
the graph through neighbor nodes [57]. Depth-first search (DFS) explores the graph
from the seed node through the child nodes, and backtracks at dead-ends.
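As a rough illustration of the random-walk approach, the sketch below implements a Metropolis-Hastings random walk (MHRW) whose target is the uniform distribution over nodes. It assumes an undirected, connected graph given as an adjacency dictionary. A neighbour $w$ of the current node $v$ is proposed uniformly, so the M-H acceptance probability reduces to $\min(1, \deg(v)/\deg(w))$. The function name `mhrw` and the star network are hypothetical.

```python
import random


def mhrw(adj, start, steps, rng):
    """Metropolis-Hastings random walk with a uniform stationary distribution over nodes.

    A neighbour w of the current node v is proposed uniformly at random and accepted
    with probability min(1, deg(v) / deg(w)); on rejection the walk stays at v.
    """
    v = start
    samples = []
    for _ in range(steps):
        w = rng.choice(adj[v])
        if rng.random() < min(1.0, len(adj[v]) / len(adj[w])):
            v = w
        samples.append(v)
    return samples


# Star network: hub 0 with four leaves. A simple random walk would sit on the hub
# half of the time; MHRW should visit each of the five nodes about equally often.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
samples = mhrw(adj, start=1, steps=100_000, rng=random.Random(0))
print(samples.count(0) / len(samples))  # close to 1/5 for a long enough walk
```

The self-loops created by rejected proposals are exactly what removes the bias toward high-degree nodes, at the cost of repeated samples and slower exploration.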


Factors such as sample size, as well as seed and algorithm choice can in- 
troduce bias into the statistical analysis of a network. Several authors have 
performed detailed investigations of the efficiency and bias associated with 
sampling algorithms using different OSNs [18,92]. BFS is the most widely used method
for OSN sampling and has been shown to be biased toward high-degree nodes [87,160].
Several random-walk variants, including M-H-based schemes, have been proposed to
reduce or correct this sampling bias: Metropolized Random Walk with Backtracking
(MRWB), M-H Random Walk (MHRW), Re-Weighted Random Walk (RWRW), and Unbiased
Sampling to Reduce Self Loops (USRS) [54, 125, 139, 152].
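The re-weighting idea behind RWRW can be sketched as follows, assuming an undirected, connected graph and a simple (uniform-neighbour) random walk, which visits each node in proportion to its degree. Weighting each sample by $1/\deg(v)$ undoes that bias when estimating a population mean. The helper names and the toy graph are hypothetical; the example estimates mean degree, whose true value here is $10/5 = 2$.

```python
import random


def simple_random_walk(adj, start, steps, rng):
    """Uniform-neighbour random walk; in the long run it visits nodes in proportion to degree."""
    v, samples = start, []
    for _ in range(steps):
        v = rng.choice(adj[v])
        samples.append(v)
    return samples


def rwrw_mean(samples, adj, f):
    """Re-weighted estimate of the population mean of f, weighting each sample by 1/deg(v)."""
    num = sum(f(v) / len(adj[v]) for v in samples)
    den = sum(1.0 / len(adj[v]) for v in samples)
    return num / den


# Hub 0 with four leaves plus one tie between leaves 1 and 2.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}


def degree(v):
    return len(adj[v])


walk = simple_random_walk(adj, start=3, steps=100_000, rng=random.Random(1))
naive = sum(degree(v) for v in walk) / len(walk)  # biased toward high-degree nodes
corrected = rwrw_mean(walk, adj, degree)          # should be close to the true mean of 2
print(round(naive, 2), round(corrected, 2))
```

The naive average converges to $\sum_v \deg(v)^2 / \sum_v \deg(v) = 2.6$ on this graph, which makes the high-degree bias of uncorrected crawls easy to see.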


Publicly Available Data: 

Several data resources have been created to house a wealth of diverse so- 
cial network data. These resources are usually open source, requiring, at a 
minimum, a user agreement. Leveraging these resources is ideal for the development and
testing of methodologies related to SNA. Max Planck researchers have released OSN
data used in publications, which includes crawled data from Flickr, YouTube,
Wikipedia and Facebook [20, 21, 101, 149]. Several directed OSNs have been released
in the Stanford Network Analysis Package (SNAP), e.g., from Epinions, Amazon,
LiveJournal, Slashdot and Wikipedia voting [138]. Recently, a Facebook dataset
collected with MHRW was released, which exhibited convergence properties and was
shown to be representative of the underlying population [54]. However, the MHRW and
UNI (uniform sampling) data sets contain only link information, thereby prohibiting
attribute-based analyses.





Document classification datasets have also been released [53]. A sample 
from the CiteSeer database contains 3,312 publications from one of six classes,
and 4,732 links. The Cora dataset consists of 2,708 publications classified into
seven categories, and its citation network has 5,429 links. Each publication is
described by a binary word vector indicating the presence or absence of each word
from a dictionary of 1,433 words. WebKB consists of 877 scientific publications
from five classes, contains 1,601 links and includes binary word attributes similar
to Cora. Terrorism databases are also publicly available [38, 141, 142]. The most
extensive is the RAND Database of Worldwide Terrorism Incidents, which details
terrorist attacks in nine distinct regions of the world across the time span 1968
to 2009 (dates vary slightly depending on region) [38]. Several
well-known challenges may arise in the analysis and representation of terrorist 
network data, including incomplete information, latent variables influencing 
node dynamics, and fuzzy boundaries between terrorists, supporters of terror- 
ists, and the innocent [85, 136]. 


An alternative option to access data is to enroll in data challenges, which 
are often posed by corporations and operators of the networks themselves. 
For example, the Nokia Mobile Data Challenge data were released in 2012 [90].
The data follow 200 users over the course of a year, and include usage (full call
and message logs), status (GPS readings, operation mode), environment
(accelerometer samples, Wi-Fi access points, Bluetooth devices), personal
information (full contact list, calendar), and user profiles. Formal requests are
required to use these data, ensuring use for research and development and
prohibiting commercial use. Twitter has just posed TREC 2013, a collection of 240
million tweets (statuses) collected over a two-month period [100]. This is the third
year of TREC releases. Use of these data requires registration for the competition,
which in 2013 centers around real-time ad hoc search tasks.
3 Directed Probabilistic Graph Models


Bayesian Networks (BNs) are a special class of PGMs that capture directed 
dependencies between variables, which may represent cause-and-effect rela- 
tionships. We describe two different branches of BNs, static and dynamic, 
which may be used to model social networks at a single time point or across a 
series of time points, respectively. Both rely on the Markov assumption, which
enables the compact representation of the high-dimensional joint probability
distribution of the variables in the model. Arguably, the use of directed graphs 
in SNA has been somewhat limited, although the applications themselves are 
diverse. We describe the basic principles of these directed PGMs and motivate 
them with applications in the literature, which showcase their utility in SNA. 


Static Bayesian Networks utilize data from a single snapshot of a social community
at a given time point, described by a Directed Acyclic Graph (DAG). A DAG conveys
precise information regarding the conditional independencies between modeled
variables (nodes). The resulting graph, $G$, can be translated directly into a
factored representation of the joint distribution [67,91]. BNs obey the Markov
condition, which states that each variable, $X_i$, is independent of its
non-descendants, given its parents in $G$. Under these assumptions, a BN for a set of
variables $\{X_1, X_2, \ldots, X_n\}$ is a network with the structure that encodes
conditional independence relationships:


$$P(X_1, X_2, \ldots, X_n, G) = P(G) \prod_{i=1}^{n} P(X_i \mid \mathrm{pa}(X_i), \theta_i),$$


where $P(G)$ is the prior distribution over the graph $G$, $\mathrm{pa}(X_i)$ are the
parent nodes of child $X_i$, and $\theta_i$ denotes the parameters of the local
probability distribution.
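The factorization can be checked numerically on a toy network. The sketch below assumes a fixed structure (so the $P(G)$ term plays no role) and a hypothetical three-node chain of binary variables, Post → Seen → Share, with made-up conditional probability tables. It computes the joint as the product of local conditionals and confirms that the result sums to one over all $2^3$ assignments.

```python
from itertools import product

# Hypothetical BN over binary variables: Post -> Seen -> Share (structure assumed fixed).
PARENTS = {"Post": [], "Seen": ["Post"], "Share": ["Seen"]}
# CPT[var][parent_values] = P(var = 1 | parents); all numbers are illustrative.
CPT = {
    "Post": {(): 0.3},
    "Seen": {(0,): 0.05, (1,): 0.8},
    "Share": {(0,): 0.0, (1,): 0.25},
}


def joint(assignment):
    """P(x_1, ..., x_n) = prod_i P(x_i | pa(x_i)) for the fixed DAG above."""
    p = 1.0
    for var, pa in PARENTS.items():
        p1 = CPT[var][tuple(assignment[q] for q in pa)]
        p *= p1 if assignment[var] == 1 else 1.0 - p1
    return p


names = list(PARENTS)
total = sum(joint(dict(zip(names, vals))) for vals in product([0, 1], repeat=len(names)))
print(round(total, 10))                                       # 1.0
print(round(joint({"Post": 1, "Seen": 1, "Share": 1}), 6))    # 0.3 * 0.8 * 0.25 = 0.06
```

Only $1 + 2 + 2 = 5$ numbers specify this joint, instead of the $2^3 - 1 = 7$ needed for an unrestricted table; the savings grow quickly with the number of variables.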


Depending on the data and modeling objectives, BN learning may require 
up to two layers of inference: structural and parameter learning. Identifying 
the DAG that best explains the data is an NP-hard problem [27]. Structural 
inference can be conducted by sampling the posterior distribution to obtain 
an ensemble of feasible graphs, or through a greedy hill-climbing algorithm that
identifies a single graph structure approximating the Maximum a Posteriori (MAP)
structure [68]. In many applications of
SNA, the structure is often assumed, at least to some degree. In this case, the 
statistical inference problem is local parameter inference conditional on the 
assumed structure of the network. 
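When the structure is assumed, local parameter inference for discrete variables reduces to counting. The sketch below assumes binary variables stored as dictionaries and estimates $P(\text{child} = 1 \mid \text{parents})$ for one node. Setting `alpha=0` gives the maximum-likelihood estimate, and `alpha > 0` adds Laplace pseudo-counts (a simple stand-in for a Beta prior). The helper `fit_cpt` and the records are hypothetical.

```python
from collections import Counter


def fit_cpt(data, child, parent_list, alpha=1.0):
    """Estimate P(child = 1 | parents) from binary records for a fixed structure.

    alpha is an add-alpha pseudo-count; alpha=0 gives the maximum-likelihood estimate.
    Parent configurations never seen in the data are omitted from the result.
    """
    ones, totals = Counter(), Counter()
    for row in data:
        key = tuple(row[p] for p in parent_list)
        totals[key] += 1
        ones[key] += row[child]
    return {k: (ones[k] + alpha) / (totals[k] + 2 * alpha) for k in totals}


# Hypothetical records for the local model Seen -> Share.
data = [
    {"Seen": 1, "Share": 1}, {"Seen": 1, "Share": 0},
    {"Seen": 1, "Share": 0}, {"Seen": 1, "Share": 0},
    {"Seen": 0, "Share": 0}, {"Seen": 0, "Share": 0},
]
print(fit_cpt(data, "Share", ["Seen"], alpha=0))  # {(1,): 0.25, (0,): 0.0}
print(fit_cpt(data, "Share", ["Seen"]))           # smoothed: about {(1,): 0.333, (0,): 0.25}
```

The smoothed estimate avoids assigning zero probability to events that simply were not observed, which matters when the sampled network is small or incomplete.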


The directionality and causal structure of the inferred model make BNs an attractive
modeling paradigm for social networks, one that captures and conveys cause-and-effect
relationships in a problem setting. Such relationships may manifest, for example, in
decision making (influence). Screen-Based Bayes Net Structure (SBNS)


8 Alireza Farasat et al. 





was developed as a search strategy for large-scale data, which relies on the 
adopted assumption of sparsity in the overall network structure [55]. Sparsity 
in BN is a popular assumption that can safeguard against over-fitting [68]. 
SBNS enforces sparsity through a two-stage process, which frames the structural
learning problem as a Market Basket Analysis task [12]. The algorithm relies on the
theory of frequent sets and support, to first screen for local
modules of nodes, and then connect them through a global structure search. 
The Market Basket framework lends itself to transaction style data, which 
is by nature large, sparse and binary. In this case, actors are assumed to be 
linked to each other indirectly through items or events (Figure 1A). The learn- 
ing problem is to identify an influence graph based on derived features of the 
binary transaction data. The method was shown to be effective for modeling 
a variety of SNs, including citation networks, collaboration data, and movie 
appearance records [12]. 
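The support-based screening step can be illustrated with a short sketch. It is not the SBNS algorithm itself; it only shows, under the assumption that each transaction (event) is a set of participating actors, how pairwise support is computed and thresholded to propose candidate local modules. The function `frequent_pairs`, the events, and the threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Pairs of actors whose co-occurrence support is at least min_support.

    support(pair) = fraction of transactions (events) that contain both actors.
    """
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical events (e.g., co-authored papers or film casts) listing participating actors.
events = [{"ann", "bob"}, {"ann", "bob", "cy"}, {"bob", "cy"}, {"ann", "bob"}, {"dee"}]
print(frequent_pairs(events, min_support=0.4))  # {('ann', 'bob'): 0.6, ('bob', 'cy'): 0.4}
```

Pairs that survive the screen are the only candidates passed to the more expensive structure search, which is where the sparsity assumption pays off computationally.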


Koelle et al. proposed applications of BNs to SNA for the prediction of 
novel links and pre-specified node features (e.g., leadership potential) [80]. 
The authors emphasize the advantage of BN to account for uncertainty, noise, 
and incompleteness in the network. For example, topology-based network measures such
as degree centrality, which are often used as surrogates for importance, are computed
as summaries over incomplete and sometimes erroneous
data. Comparatively, a BN affords more flexibility that enables measures such 
as importance to be estimated in a more data-dependent manner. Koelle et al. 
provide an example of combining topology-based network measures with co- 
variate information (Figure 1B). Directed inference of this type leverages small 
local models, which can be naturally translated to regression or classification 
problems, depending on the child node (response variable). In this setting, the 
local BN can be evaluated at the node-level, ranked probability estimates can 
be used for predictive purposes, and the output serves as a surrogate for model 
fit on a given structure. 
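A local model of this kind might combine a topology-based measure with a covariate as follows. The sketch assumes a child node "leader" with two parents, a binarized degree centrality (degree of at least 2, an arbitrary illustrative cut-off) and a binary seniority covariate; it estimates the conditional probability table from labelled actors with add-one smoothing and ranks unlabelled actors. All names, data, and the threshold are hypothetical and not Koelle et al.'s model.

```python
from collections import Counter


def degree(edges):
    """Degree centrality (number of ties) for each actor in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def local_bn(labels, features, alpha=1.0):
    """Return f -> P(leader = 1 | parent configuration f), with add-alpha smoothing."""
    ones, totals = Counter(), Counter()
    for node, y in labels.items():
        totals[features[node]] += 1
        ones[features[node]] += y
    return lambda f: (ones[f] + alpha) / (totals[f] + 2 * alpha)


edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("e", "f"), ("g", "a"), ("g", "b")]
senior = {"a": 1, "b": 0, "c": 1, "d": 0, "e": 1, "f": 0, "g": 1}
labels = {"a": 1, "b": 0, "c": 1, "d": 0, "f": 0}  # known leadership status

deg = degree(edges)
features = {n: (deg[n] >= 2, senior[n]) for n in senior}  # (high degree?, senior?)
predict = local_bn(labels, features)
unlabelled = [n for n in senior if n not in labels]
ranked = sorted(unlabelled, key=lambda n: predict(features[n]), reverse=True)
print([(n, predict(features[n])) for n in ranked])  # [('g', 0.75), ('e', 0.5)]
```

The ranked probabilities serve the predictive role described above; swapping the table for a logistic regression would turn the same local model into the regression form.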


Privacy protection is a major concern amongst users in online social net- 
works [65]. Generally, people prefer that their personal information is shared 
in small circles of friends and family, and shielded from strangers [24]. Despite 
this common desire, relatively simple BNs have been shown to be successful in 
the invasion of privacy through the inference of personal attributes, which have
been shielded through privacy settings [65]. The BNs operate under the often 
accurate assumption that friends in social circles are likely to share common 
attributes. In 2006, the recommendation by He et al. to improve privacy was 
to hide friend lists through privacy settings, and to request that friends hide 
their personal attributes. Practically speaking, choosing optimal privacy settings is
complex, and can be tedious and difficult for the average user [96].
In 2010, a privacy wizard template was proposed, which automates a person's
privacy settings based on an implicit set of rules derived using Naive Bayes 
(the simplest BN) or Decision Tree methods [43]. 
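The homophily assumption behind such attacks can be expressed as a Naive Bayes model. The sketch below is not He et al.'s network; it assumes that each friend's public value of an attribute is conditionally independent given the user's hidden value, matching it with probability `homophily` and otherwise spreading uniformly over the other categories. The prior, category names, and parameter values are hypothetical.

```python
import math


def infer_hidden_attribute(friend_values, prior, homophily):
    """Naive Bayes posterior over a user's hidden attribute given friends' public values."""
    k = len(prior)
    log_post = {}
    for value, p in prior.items():
        lp = math.log(p)
        for fv in friend_values:
            # Likelihood of one friend's public value given the user's hidden value.
            lp += math.log(homophily if fv == value else (1 - homophily) / (k - 1))
        log_post[value] = lp
    # Normalise in log space for numerical stability.
    m = max(log_post.values())
    z = sum(math.exp(lp - m) for lp in log_post.values())
    return {value: math.exp(lp - m) / z for value, lp in log_post.items()}


prior = {"cityA": 0.5, "cityB": 0.3, "cityC": 0.2}
post = infer_hidden_attribute(["cityB", "cityB", "cityA"], prior, homophily=0.6)
print(max(post, key=post.get))  # cityB, despite cityA having the larger prior
```

Hiding the user's own attribute does not help here because the evidence comes entirely from friends, which is why the recommendation above extends to friend lists and friends' attributes.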


Probabilistic Graphical Models in Modern Social Network Analysis 9 





On the other side of the application spectrum, BNs are useful for recommending
products and services to users, taking into account their interests,
needs and communications patterns. Belief propagation has been used to sum- 
marize belief about a product and propagate that belief through a BN [9,159]. 
Belief propagation is the process in which node marginal distributions (beliefs) 
are updated in light of new evidence [82]. In the case of a BN, evidence (e.g., 
opinion or ratings) is absorbed and propagated through a computational object 
known as a junction tree, resulting in updated marginal distributions. Compar- 
ing the network marginals before and after evidence is entered and propagated 
conveys a system-wide effect of influence(s), and insights into how perception 
or ratings change when recommendations are passed through a network. Despite its
simplicity, the BN approach has been shown to be competitive with the more classical
Collaborative Filtering (CF)-based recommendation [158]. Trust (and distrust) can be
highly variable, dynamic processes, which depend not only on distance from a
recommender but also on the characteristics of the network users [88, 153].
Accounting for trust in recommendation systems is an open area of research.
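The before-and-after comparison of marginals can be sketched on a tiny network. The example assumes a hypothetical BN in which product quality $Q$ influences the ratings $R_1$ and $R_2$ of two users. For simplicity it computes exact marginals by summing the joint over all consistent assignments; this stands in for junction-tree propagation, which yields the same marginals more efficiently on larger networks.

```python
from itertools import product

# Hypothetical BN: product quality Q -> ratings R1, R2 by two users (illustrative numbers).
P_Q = 0.5                       # P(Q = 1)
P_R_GIVEN_Q = {1: 0.8, 0: 0.3}  # P(R = 1 | Q), shared by both ratings


def joint(a):
    """P(Q, R1, R2) = P(Q) P(R1 | Q) P(R2 | Q)."""
    q = a["Q"]
    p = P_Q if q else 1 - P_Q
    for r in ("R1", "R2"):
        p *= P_R_GIVEN_Q[q] if a[r] else 1 - P_R_GIVEN_Q[q]
    return p


def belief(var, evidence=None):
    """P(var = 1 | evidence) by summing the joint over assignments consistent with evidence."""
    evidence = evidence or {}
    num = den = 0.0
    for vals in product([0, 1], repeat=3):
        a = dict(zip(("Q", "R1", "R2"), vals))
        if any(a[k] != v for k, v in evidence.items()):
            continue
        p = joint(a)
        den += p
        if a[var] == 1:
            num += p
    return num / den


print(belief("R2"))             # prior belief that user 2 rates highly, about 0.55
print(belief("R2", {"R1": 1}))  # after user 1's high rating, about 0.664
```

The shift from about 0.55 to about 0.664 is the system-wide effect of entering a single piece of evidence, routed through the shared parent $Q$.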


Microblogging networks represent another effective venue for rapidly dis- 
seminating information and influence throughout a community. Twitter is the 
most well-known microblogging network, in which posts (tweets) are short,
time-sensitive, and often refer to current topics [89]. Users within microblogging
networks of this type participate through the act of following and being followed,
which gives rise naturally to directed associations [75]. With over 50 million tweets
submitted daily, ranking and querying microblogs has become an important and active
area of open research [25, 97, 105, 110, 114]. Jabeur et al. proposed a retrieval
model for tweet searches, which takes into account a number of factors, including
hashtags, the influence of the microbloggers, and time [72,73]. A query relevance
function was developed based on
a BN that leverages the PageRank algorithm to estimate parameters, such as 
influence, in the model (Figure 1C). The retrieval model was shown to outper- 
form traditional methods for information retrieval on Twitter data from the 
TREC Tweets 2011 corpus [111]. 
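For reference, the PageRank algorithm used for such influence estimates can be computed by power iteration. The sketch below assumes a hypothetical directed graph in which an edge $u \to v$ means that user $u$ retweets or mentions user $v$, the usual damping factor of 0.85, and uniform redistribution of score from users without out-links. It is a generic PageRank, not the exact graph construction of Jabeur et al.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank by power iteration on a directed graph given as {node: [targets]}."""
    nodes = set(out_links) | {v for targets in out_links.values() for v in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Score held by nodes with no out-links is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in out_links.items():
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


# Hypothetical retweet graph: an edge u -> v means u retweeted v.
graph = {"bob": ["alice"], "carol": ["alice"], "dave": ["alice", "carol"], "alice": ["carol"]}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['alice', 'carol', ...]
```

In a retrieval model of the kind described above, such scores would enter as one factor (microblogger influence) alongside query-dependent evidence like hashtags and recency.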
Fig. 1 Bayesian Network Applications: simplified schematics of select examples of
Bayesian Networks in social networks. (A) Sparse Bayesian influence: individuals are
linked through events, and sparse Bayesian influence is inferred from this
transaction-style data. (B) Local Bayesian network prediction: local models can be
used to predict local metrics, such as individual importance or leadership
potential, from attributes and from centrality measures on the network itself.
(C) Bayesian models of Twitter queries: Twitter is a microblogging community, which
can be queried using a retrieval model described by a Bayesian Network.


Thus far, the BNs discussed summarize information at a single time-point. 
This represents an oversimplification of the true nature of the networks de- 
scribed, which are inherently dynamic [137]. In the described SN applications, 
the dynamic aspects are simplified by extracting data from a snapshot (or 
series of snapshots) of the SN across a time period. The choice of discretization,
coarse or fine, can bias the results of the analysis, and discretization can give
rise to many of the issues related to data collection discussed in Section 2.
Modeling the dynamics of a network over the time course can be achieved in the
BN framework with additional modeling assumptions.


Dynamic Bayesian Networks Dynamic Bayesian Networks (DBNs) pro- 
vide compact representations for encoding structured probability distributions 
over arbitrarily long time courses [103]. State-space models, such as Hidden
Markov Models (HMMs) and Kalman Filter Models (KFMs), can be viewed as
a special class of the more general DBN. Specifically, KFMs require unimodal 
linear Gaussian assumptions on the state-space variables. HMMs do not allow 
for factorizations within the state-space, but can be extended to hierarchical 
HMMs for this purpose. DBNs enable a more general representation of sequen- 
tial or time-course data. 


DBN modeling is achieved through the use of template models, which are 
instantiated, i.e., duplicated, over multiple time points. The relationships between
the variables within a template are fixed, and represent the inherent dependencies
between ground variables in the model. The objective is to model a template variable
over a discretized time course, $X^0, \ldots, X^T$, and represent $P(X^{0:T})$ as a
function of the templates over the range of time points. Reducing the temporal
problem to conditional template models makes the
problem computationally tractable, but requires the specification of a fixed 
structure across the entire time trajectory. 


In a DBN, the probability for a random variable X spanning the time 
course can be given in factored form, 


$$P(X^{0:T}) = P(X^0) \prod_{t=0}^{T-1} P(X^{t+1} \mid X^t),$$


where $X^0$ represents the initial state, and the conditional probability terms of
the form $P(X^{t+1} \mid X^t)$ convey the conditional independence assumptions. The
conditional representation of the likelihood is similar in spirit to the static BN
representation, but conveys conditional independence with respect to time. The
Markov assumption enables this factorization, which has different, yet analogous,
meanings in static and dynamic BNs. In a DBN, the Markov assumption expresses the
memorylessness property, i.e., that the next state depends only on the current state
and is conditionally independent of the earlier past,
$(X^{t+1} \perp X^{0:t-1} \mid X^t)$. Comparatively, in static BNs, the Markov
assumption only captures nodes' independence of their non-descendants, given the
states of their parents.
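The factorization above can be evaluated directly once a template is fixed. The sketch assumes a hypothetical two-state template for a user's activity (0 = inactive, 1 = active) with made-up initial and transition probabilities. The same transition table is reused at every time step, which is the repeated-template structure of a DBN; it computes the probability of one trajectory and the marginal $P(X^t)$.

```python
# Hypothetical two-state template for a user's activity: 0 = inactive, 1 = active.
INITIAL = [0.7, 0.3]          # P(X^0)
TRANSITION = [[0.9, 0.1],     # P(X^{t+1} | X^t = 0)
              [0.4, 0.6]]     # P(X^{t+1} | X^t = 1)


def trajectory_prob(states):
    """P(X^0, ..., X^T) = P(X^0) * prod_t P(X^{t+1} | X^t), reusing one template."""
    p = INITIAL[states[0]]
    for prev, nxt in zip(states, states[1:]):
        p *= TRANSITION[prev][nxt]
    return p


def marginal_at(t):
    """P(X^t), obtained by pushing P(X^0) through the same template t times."""
    dist = list(INITIAL)
    for _ in range(t):
        dist = [sum(dist[i] * TRANSITION[i][j] for i in range(2)) for j in range(2)]
    return dist


print(trajectory_prob([0, 0, 1, 1]))  # 0.7 * 0.9 * 0.1 * 0.6, about 0.0378
print(marginal_at(1))                 # about [0.75, 0.25]
```

Because only the template is specified, the number of parameters stays the same no matter how long the time course $T$ is.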


Both DBNs and static BNs represent joint distributions of random vari- 
ables. Similar to static BNs, DBNs also may require up to two layers of in- 
ference, structural and parameter learning. The learning paradigms are rather 
similar. Structural learning is typically achieved by the same scoring strategies, 
but with the added constraint that the structure must repeat over time [49]. 
Such a constraint alleviates the computational burden for search strategies. 
Additionally, the best initial structure can be searched for independently from
the remainder of the time-course. The search is performed either through 
greedy hill climbing or sampling. Several options exist for parameter learning,
including junction trees, belief propagation, and the EM algorithm [33,78, 132].


Despite the fact that social networks are typically inherently dynamic, the 
applications of DBNs in SNA have been limited. Importantly, there have been 
many attempts to model social networks probabilistically over time, but not 
in the strict PGM context, which is the focus of this review; many of these 
advances are discussed in Section 5. Chapelle et al. used DBNs to model web 
users’ browsing history [22]. The DBN extends the traditional and widely- 
used cascade model for browsing behavior to a more general model [77]. The 
dynamic studied here is that of click sequences, which is illustrated in Figure 2 for
a single click (one time instance). The model takes into account the information at
the query and session levels, differentiating perceived/actual attraction ($a_u$ and
$A_i$, respectively) and perceived/actual satisfaction ($s_u$ and $S_i$,
respectively) with links. At each click (time step), the hidden binary variables for
examination ($E_i$) and satisfaction ($S_i$) track the time progression to predict
future clicks. The DBN approach was shown to outperform traditional methods, and
highlighted the sensitivity of click modeling to measures of relevance and
popularity at the query level.
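A sketch of how such a model turns attraction and satisfaction into click probabilities is given below. It assumes the commonly used formulation of the DBN click model: the first result is always examined, a result is clicked if it is examined and attractive (probability $a_u$), a click satisfies the user with probability $s_u$, and an unsatisfied user examines the next result with probability $\gamma$. The continuation parameter $\gamma$ and all numbers are assumptions for illustration, not values from the study.

```python
def dbn_click_probs(attract, satisfy, gamma):
    """Per-rank click probabilities under a DBN-style click model.

    attract[i]: a_u, probability the i-th result is clicked if examined.
    satisfy[i]: s_u, probability the user is satisfied after clicking it.
    gamma: probability of examining the next result when not yet satisfied.
    """
    p_exam = 1.0  # the first result is always examined
    clicks = []
    for a, s in zip(attract, satisfy):
        clicks.append(p_exam * a)
        # Move on only if not (clicked and satisfied), and then with probability gamma.
        p_exam *= gamma * (1 - a * s)
    return clicks


print(dbn_click_probs([0.6, 0.5, 0.4], [0.5, 0.5, 0.5], gamma=0.9))  # about [0.6, 0.315, 0.1701]
# With gamma = 1 and s_u = 1 the model reduces to the cascade model (stop after the first click).
print(dbn_click_probs([0.6, 0.5, 0.4], [1.0, 1.0, 1.0], gamma=1.0))  # about [0.6, 0.2, 0.08]
```

The second call shows why the DBN is described as a generalization: the cascade model is recovered as a special case of the parameters.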


DBNs and HMMs are very popular in the area of speech recognition [115, 
162]. Meetings are social events, in which valuable information is exchanged 
mainly through speech. Effectively processing, capturing, and organizing this 
information can be costly, but is critical in order to maximize the impact and 
information flow for participants. Dielman et al. cast the problem of meeting
structuring as a DBN, which partitions meetings into sequences of actions or phases
based on audio [35]. The features included speaker order, location detected from a
microphone array, talk rate, pitch, and overall energy (enthusiasm). DBNs
outperformed baseline HMMs in detecting meeting actions in
a smart room, such as dialogue, notes at the board, computer presentations, 
and presentations at the board. 


Twitter, and microblogs in general, have become a major resource for the media to
obtain breaking news or learn of the occurrence of a critical event. Recently,
Sakaki et al. modeled Twitter activity using KFMs in an effort to identify events
and their locations [124]. Each Twitter user is treated as a sensor, and features of
their tweets, such as keywords, location, length, and content, serve as the sensor
readings. Support Vector Machines (SVMs) are first used for event classification,
followed by a Kalman filter to estimate the location and the path of the event
itself. The location of the quake is estimated through parameter learning at each
time point. Through tweet modeling, the authors were able to detect 96% of Japan's
earthquakes of a certain magnitude. Furthermore, they developed a reporting system,
Toretter, which warns registered individuals of an impending quake by email faster
than the existing government reporting system [74]. This important and highly cited
work can be generalized within this paradigm to model and predict other events.
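The filtering step can be illustrated with a scalar Kalman filter. The sketch is not Sakaki et al.'s exact model; it assumes the event location is (nearly) static, that each geotagged report observes one coordinate with Gaussian noise of known variance, and that latitude and longitude are filtered independently. The function `kalman_1d`, the readings, and the noise variances are hypothetical.

```python
def kalman_1d(observations, init_mean, init_var, obs_var, process_var=0.0):
    """Scalar Kalman filter for a (nearly) static quantity observed with noise.

    State model:  x_t = x_{t-1} + w_t,  w_t ~ N(0, process_var)
    Observation:  z_t = x_t + v_t,      v_t ~ N(0, obs_var)
    Returns the posterior mean and variance after all observations.
    """
    mean, var = init_mean, init_var
    for z in observations:
        var += process_var               # predict step
        gain = var / (var + obs_var)     # Kalman gain
        mean += gain * (z - mean)        # update with the new observation
        var *= 1 - gain
    return mean, var


# Hypothetical latitude readings (degrees) attached to tweets about one event.
lat_reports = [35.2, 35.9, 35.5, 35.7, 35.4]
lat, lat_var = kalman_1d(lat_reports, init_mean=35.0, init_var=4.0, obs_var=0.25)
print(round(lat, 3), round(lat_var, 4))
```

With `process_var=0` and a very diffuse prior, the estimate approaches the running average of the reports; a positive `process_var` lets the estimate track a moving event, such as a typhoon path.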





Fig. 2 An example of a time instance in a DBN used for click modeling in a browser.
The temporal dimension is the click sequence, which progresses through binary latent
variables depicting satisfaction ($S_i$) and examination ($E_i$). Attraction ($A_i$)
and satisfaction ($S_i$) are modeled at the session level, as well as at the query
level ($a_u$ and $s_u$), which is assumed to be time invariant.


4 Undirected Probabilistic Graph Models 


Markov Networks (MNs), also known as Markov Random Fields (MRFs), are PGMs with
undirected edges. Similar to directed BNs, an MN graph is a representation of the
joint distribution between variables (nodes), where the absence of an edge between
two nodes implies conditional independence between
the nodes, given the other nodes in the network. In this review, we restrict 
our focus to MNs, Markov Logic Networks (MLNs) and Exponential Random 
Graph Models (ERGMs), which can be viewed as generalizations of random graphs [47]
and are widely used in SNA [109]. The basic formulation of
these models and their utility in SNA will be highlighted. 


Markov Networks can be decomposed into smaller complete sub-graphs 
known as cliques. A clique is a maximal clique if it cannot be extended to include
additional adjacent nodes. Clique representation enables a compact
factorization of the probability density function (pdf). Specifically, the pdf 
captured by a graph G can be represented in the form: 


$$P(X) = \frac{1}{Z} \prod_{C \in \Omega} \psi_C(X_C), \qquad (1)$$

where $C$ is a maximal clique in the set of maximal cliques $\Omega$, and $\psi_C(X_C)$ is the clique potential. The clique potentials are positive functions that capture the variable dependence within the cliques [82]. The normalizing constant, also known as the partition function, is given as:

$$Z = \sum_{X \in \mathcal{X}} \prod_{C \in \Omega} \psi_C(X_C).$$


Each clique potential in an MN is specified by a factor, which can be viewed as a table of weights, one for each combination of values of the variables in the potential. In some special cases of MNs, such as log-linear models [104], clique potentials are represented by a set of functions, termed features, with associated weights (i.e., $w_C\,\phi_C(X_C) = \log \psi_C(X_C)$, where $\phi_C(X_C)$ is a feature derived from the values of the variables in set $X_C$ and $w_C$ is its weight).
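To make Equation 1 and the log-linear form concrete, the following Python sketch enumerates every joint state of a hypothetical three-variable chain A - B - C with maximal cliques {A, B} and {B, C}. The "agreement" feature and the weights are illustrative assumptions, not values from the cited works; brute-force enumeration of Z is only feasible for a handful of variables.

```python
import itertools
import math

# Hypothetical chain MN A - B - C with maximal cliques {A, B} and {B, C}.
CLIQUES = [(0, 1), (1, 2)]
# Illustrative log-linear weights w_C (assumed, not from the cited works).
WEIGHTS = [1.0, 0.5]


def agreement(values):
    """Hypothetical feature phi_C: 1.0 if all clique variables agree, else 0.0."""
    return 1.0 if len(set(values)) == 1 else 0.0


def potential(idx, x):
    """Log-linear clique potential psi_C(x_C) = exp(w_C * phi_C(x_C)) > 0."""
    x_c = tuple(x[v] for v in CLIQUES[idx])
    return math.exp(WEIGHTS[idx] * agreement(x_c))


def unnormalized(x):
    """Product of clique potentials: the numerator of Equation (1)."""
    return math.prod(potential(idx, x) for idx in range(len(CLIQUES)))


def joint_distribution(n_vars=3):
    """Exact P(X) and partition function Z by enumerating all binary states."""
    states = list(itertools.product([0, 1], repeat=n_vars))
    z = sum(unnormalized(x) for x in states)
    return {x: unnormalized(x) / z for x in states}, z


probs, Z = joint_distribution()
```

Because the two agreement features vary independently over the eight states, Z here equals $2(1 + e^{1})(1 + e^{0.5})$, and the two fully agreeing states are the most probable.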


The Hammersley-Clifford theorem specifies the conditions under which a positive probability distribution can be represented as an MN. Specifically, a strictly positive distribution satisfies the conditional independencies encoded by G if and only if it factorizes over the cliques of G as in Equation 1, i.e., it is a Gibbs measure [61].
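The link between factorization and conditional independence can be checked numerically on a small example. In this hypothetical chain A - B - C (no edge between A and C), the table factors are arbitrary positive numbers chosen purely for illustration; the sketch measures how far P(a, c | b) departs from P(a | b) P(c | b).

```python
import itertools

# Hypothetical chain MN A - B - C; arbitrary positive table factors,
# one entry per joint value of each clique. There is no A - C edge.
PSI_AB = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 2.0}
PSI_BC = {(0, 0): 1.5, (0, 1): 4.0, (1, 0): 2.5, (1, 1): 0.2}


def joint():
    """Exact P(a, b, c) = psi_AB(a, b) * psi_BC(b, c) / Z by enumeration."""
    table = {
        (a, b, c): PSI_AB[(a, b)] * PSI_BC[(b, c)]
        for a, b, c in itertools.product([0, 1], repeat=3)
    }
    z = sum(table.values())
    return {k: v / z for k, v in table.items()}


def max_ci_violation(p):
    """Largest |P(a, c | b) - P(a | b) P(c | b)| over all values of a, b, c."""
    worst = 0.0
    for b in (0, 1):
        p_b = sum(v for (_, bb, _), v in p.items() if bb == b)
        for a, c in itertools.product([0, 1], repeat=2):
            p_ac_b = p[(a, b, c)] / p_b
            p_a_b = sum(p[(a, b, cc)] for cc in (0, 1)) / p_b
            p_c_b = sum(p[(aa, b, c)] for aa in (0, 1)) / p_b
            worst = max(worst, abs(p_ac_b - p_a_b * p_c_b))
    return worst


P = joint()
violation = max_ci_violation(P)
```

The violation is zero up to floating-point round-off, whereas A and C remain marginally dependent for these factors: the missing edge encodes independence only given B.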


MN specification problems, including parameter estimation and structure learning from data, can be quite challenging. The main difficulty in MN parameter estimation is that the maximum likelihood problem formulated with Equation 1 has no analytical solution, because the partition function Z couples all parameters and sums over every joint configuration [93]. Finding the optimal structure of G [76] from available data is, as for BNs, even more challenging [16]. Existing approaches to structure learning are either constraint-based or score-based (see [37, 81, 106, 123, 129, 161] for more details).
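A minimal sketch of why maximum likelihood has no closed form: for a log-linear MN, the gradient of the average log-likelihood with respect to each weight is the empirical feature mean minus the model's expected feature value, and the latter requires Z. The three-variable pairwise model, the data, and the step-size settings below are hypothetical; exact enumeration stands in for the approximate inference a realistic network would require.

```python
import itertools
import math

# Hypothetical pairwise MN on three binary variables (fully connected).
EDGES = [(0, 1), (0, 2), (1, 2)]
STATES = list(itertools.product([0, 1], repeat=3))


def features(x):
    """Node features x_i followed by edge features x_i * x_j."""
    return [float(v) for v in x] + [float(x[i] * x[j]) for i, j in EDGES]


def model_expectations(w):
    """E_w[f] under the current weights; needs Z, i.e., all 2^3 states."""
    feats = [features(x) for x in STATES]
    scores = [sum(wk * fk for wk, fk in zip(w, f)) for f in feats]
    top = max(scores)
    unnorm = [math.exp(s - top) for s in scores]  # shifted for stability
    z = sum(unnorm)
    return [sum(u * f[k] for u, f in zip(unnorm, feats)) / z for k in range(len(w))]


def fit(data, lr=1.0, steps=5000):
    """Gradient ascent on the average log-likelihood of a log-linear MN.

    The gradient E_data[f] - E_w[f] has no closed-form root because
    E_w[f] depends on w through the partition function Z.
    """
    n_feats = len(features(data[0]))
    emp = [sum(features(x)[k] for x in data) / len(data) for k in range(n_feats)]
    w = [0.0] * n_feats
    for _ in range(steps):
        mod = model_expectations(w)
        w = [wk + lr * (ek - mk) for wk, ek, mk in zip(w, emp, mod)]
    return w, emp


# Illustrative samples (assumed); every joint state occurs, so a finite MLE exists.
DATA = [(0, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1), (1, 0, 0),
        (0, 0, 1), (1, 1, 1), (0, 0, 0), (1, 0, 1), (0, 1, 0)]
weights, empirical = fit(DATA)
```

At convergence the model's expected features match the empirical ones (moment matching); every step, however, recomputes a sum over all joint states.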


MNs found use in SNA with the emergence of online social networks (OSNs) and digital social media (see [14] for a review of key problems in SNA). In these applications, the need to capture non-causal dependencies within and between data instances (e.g., profile information) and observed relationships (e.g., hyperlinks) is exacerbated by the presence of missing or hidden data in OSNs [156]. A popular problem instance in this domain, the prediction of users' missing profile information, has been attacked using MNs [107, 117, 140].


Along with the problem of predicting missing profiles, link prediction is among the most prominent problems in Big Data SNA. Variations of MNs and related relational models that have been used to estimate the probability that an (unobserved) link exists between nodes include Markov Logic Networks, Relational Markov Networks, Relational Bayesian Networks and Relational Dependency Networks [5, 23, 143, 145].


Detection of community substructures is another area of MN application [41, 108]. Social network clustering is especially challenging in a dynamic context, e.g., in Mobile Social Networks [70]. Wan et al. employed undirected graphical models (i.e., conditional random fields) constructed from mobile user logs that include both communication records and user movement information [151]. Communities can be discovered by examining and subsetting (cutting) network relationships according to labels of interest, and by using weighted community detection algorithms. Relational Markov Networks can be used to label relationships in a social network with given content and link structure [150].


Several generative models motivated by MNs have been proposed to explain the effects of selection and influence (e.g., see [2]). Modeling the channeled spread of opinions and rumors, known more generally as diffusion modeling, is an active area of research in SNA [10, 94, 119]. Applications of diffusion models to social networks include, but are not limited to, the spread of information [30], viral marketing [77], the spread of diseases [7], and the spread of cooperation [127]. Given a social network, a random variable attached to each node indicates the state of the node (e.g., product or technology adoption), and links in the network represent dependencies between these variables [155].
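As an illustration of diffusion over binary node states, the sketch below simulates the Independent Cascade model, a standard diffusion model from the influence-maximization literature chosen here for its simplicity; it is not necessarily the model used in the works cited above. The adjacency structure, the seed set and the activation probability `p` are hypothetical inputs.

```python
import random


def independent_cascade(adj, seeds, p, rng=None):
    """One run of the Independent Cascade model (a swapped-in illustration).

    adj:   dict node -> list of neighbours.
    seeds: initially active nodes (e.g., early adopters).
    p:     probability that an active node activates an inactive neighbour.
    Each newly activated node gets a single chance to activate each neighbour.
    Returns the set of nodes active when the cascade stops.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


# Hypothetical five-node network; node 4 is isolated.
ADJ = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1], 4: []}
sizes = [len(independent_cascade(ADJ, [0], 0.3, random.Random(s))) for s in range(1000)]
mean_spread = sum(sizes) / len(sizes)
```

Averaging over many runs, as in `mean_spread`, estimates the expected reach of a seed set; the randomness per edge is what distinguishes this from deterministic threshold rules.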


Markov Logic Networks employ a probabilistic framework that integrates MNs with first-order logic: a small set of weighted first-order formulas serves as templates from which the features of a ground MN are generated [117]. Formally, let $F_i$ denote a first-order logic formula, i.e., a logical expression comprising constants, variables, functions and predicates, and let $w_i \in \mathbb{R}$ denote a scalar weight. An MLN is then defined as a set of pairs $(F_i, w_i)$. From the MLN and a finite set of constants $C$, the ground Markov network $M_{L,C}$ is constructed [117], with the probability distribution [145]

$$P(X = x) = \frac{1}{Z} \exp\left(\sum_i w_i\, n_i(x)\right), \qquad (2)$$

where $n_i(x)$ is the number of true groundings of $F_i$ in $x$, i.e., the number of ground instances of the formula that hold in $x$. Figure 3 gives an example of a ground MLN represented as a pairwise MN (left) for two individuals [104].
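The ground MLN of Fig. 3 is small enough to evaluate Equation 2 exactly. The sketch below enumerates all $2^8$ truth assignments of the eight ground atoms, counts the true groundings $n_1(x)$ of $F_1$ and $n_2(x)$ of $F_2$, and normalizes. The weights $w_1 = 1.5$ and $w_2 = 1.1$ are hypothetical values chosen for illustration; practical MLN systems use approximate inference rather than enumeration.

```python
import itertools
import math

PEOPLE = ["A", "B"]
# Ground atoms of the Fig. 3 example: 2 Smokes, 2 Cancer and 4 Friends atoms.
ATOMS = ([("Smokes", p) for p in PEOPLE] + [("Cancer", p) for p in PEOPLE]
         + [("Friends", p, q) for p in PEOPLE for q in PEOPLE])
W1, W2 = 1.5, 1.1  # hypothetical weights for F1 and F2


def n_true_groundings(world):
    """Return (n_1(x), n_2(x)) for a world given as dict atom -> bool."""
    # F1: Smokes(x) => Cancer(x); one grounding per person.
    n1 = sum((not world[("Smokes", p)]) or world[("Cancer", p)] for p in PEOPLE)
    # F2: Friends(x, y) => (Smokes(x) <=> Smokes(y)); one grounding per ordered pair.
    n2 = sum(
        (not world[("Friends", p, q)]) or (world[("Smokes", p)] == world[("Smokes", q)])
        for p in PEOPLE for q in PEOPLE
    )
    return n1, n2


def world_distribution():
    """Exact P(X = x) = exp(w1 * n1 + w2 * n2) / Z over all 2^8 worlds."""
    worlds = [dict(zip(ATOMS, vals))
              for vals in itertools.product([False, True], repeat=len(ATOMS))]
    scores = [math.exp(W1 * n1 + W2 * n2)
              for n1, n2 in map(n_true_groundings, worlds)]
    z = sum(scores)
    return worlds, [s / z for s in scores]


def prob(query, evidence, worlds, probs):
    """P(query | evidence), with both given as dicts atom -> bool."""
    def match(w, cond):
        return all(w[a] == v for a, v in cond.items())
    num = sum(p for w, p in zip(worlds, probs) if match(w, {**evidence, **query}))
    den = sum(p for w, p in zip(worlds, probs) if match(w, evidence))
    return num / den


worlds, probs = world_distribution()
p_cancer = prob({("Cancer", "A"): True}, {("Smokes", "A"): True}, worlds, probs)
```

Since Cancer(A) appears only in the grounding of $F_1$ for A, $P(\text{Cancer}(A) \mid \text{Smokes}(A))$ reduces to $e^{w_1}/(1 + e^{w_1})$, a quick sanity check on the counting.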


Many problems in statistical relational learning, such as link prediction [39], social network modeling, collective classification, link-based clustering and object identification, can be formulated as instances of MLNs [117]. Dierkes et al. used MLNs to investigate the influence of Mobile Social Networks on consumer decision-making behavior. Representing call detail records as a weighted graph, they employed MLNs in conjunction with logit models as the learning technique, based on lagged neighborhood variables. The resulting MLNs served as predictive models for analyzing the impact of word of mouth on churn (the decision to abandon a communication service provider) and on purchase decisions [36].

Fig. 3 An example of an MLN with two entities (individuals) A and B, the unary relations “smokes” and “cancer” and the binary relation “friend”. The ground predicates are denoted by eight elliptical nodes. Two formulas, $F_1$ (“someone who smokes has cancer”) and $F_2$ (“friends either both smoke or both do not smoke”), are captured. There exist two groundings of $F_1$ (illustrated by the edges between the “smokes” and “cancer” nodes) and four groundings of $F_2$, captured by the rest of the edges [145].


As mentioned above, link mining and link prediction problems can also be addressed using MLNs, since MLNs combine logic and probabilistic reasoning in a single framework [40, 131]. Furthermore, the ability of MLNs to represent complex rules by exploiting relational information makes them an appropriate choice for collective classification (e.g., classification of publications in a citation network, or of hyperlinked webpages) [31, 34].


The Ising model and its variations form a subclass of MNs with foundations in theoretical physics [6]. The Ising model is a discrete, pairwise MN and is popular in applications in part due to its simplicity [82]. The variables in the model, $X_1, \ldots, X_p$, are assumed to be binary, and their joint probability is given as:

$$p(x; \Theta) = \exp\Big\{ \sum_{(j,k) \in E} \theta_{jk}\, x_j x_k - \Phi(\Theta) \Big\}, \qquad \forall\, x \in \mathcal{X},$$

where $x \in \{0,1\}^p$, and $\Phi(\Theta)$ is the log of the partition function

$$\Phi(\Theta) = \log \sum_{x \in \mathcal{X}} \exp\Big\{ \sum_{(j,k) \in E} \theta_{jk}\, x_j x_k \Big\}.$$

Specialized, efficient methods exist for learning the Ising model parameters from data [116]. While the model was originally found useful for understanding magnetism and phase transitions, its utility later expanded to image processing, neural modeling, and studies of tipping points in economic and social domains [1].
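A brute-force sketch of the Ising probability $p(x; \Theta)$ and its log-partition function $\Phi(\Theta)$ for $x \in \{0,1\}^p$, following the pairwise form given above (no node-specific terms). The edge set and coupling values used in practice would come from data; the triangle below is hypothetical, and enumeration costs $2^p$ terms, so this only serves to check small cases.

```python
import itertools
import math


def energy(x, theta, edges):
    """sum_{(j,k) in E} theta_jk * x_j * x_k for a binary state x."""
    return sum(theta[e] * x[e[0]] * x[e[1]] for e in edges)


def log_partition(theta, edges, p):
    """Phi(Theta) via a numerically stable log-sum-exp over {0,1}^p."""
    scores = [energy(x, theta, edges) for x in itertools.product([0, 1], repeat=p)]
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def ising_prob(x, theta, edges):
    """p(x; Theta) = exp(energy(x) - Phi(Theta))."""
    return math.exp(energy(x, theta, edges) - log_partition(theta, edges, len(x)))


# Hypothetical triangle network with equal positive couplings.
EDGES = [(0, 1), (1, 2), (0, 2)]
THETA = {e: 1.0 for e in EDGES}
p_all_agree = ising_prob((1, 1, 1), THETA, EDGES)
```

Positive couplings favour states in which linked nodes are jointly active, which is why the all-ones state dominates here; with all couplings at zero every state has probability $2^{-p}$.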


In SNA, the Ising model can be employed to analyze factors such as network sub-structures and nodal features affecting the opinion formation process. A classical example in this area is a study of the spread of medical innovation, namely the adoption of the drug tetracycline by 125 physicians in four small cities in Illinois [17]. Figure 4 depicts the physicians' advisory network from a data set prepared by Ron Burt from the 1966 data collected by Coleman, Katz and Menzel [29] about the spread of medical innovation. The figure illustrates the physicians' network at two different time points and shows how physicians changed their opinions and adopted the new medication over time.


(Legend: Adopted / Not adopted)





Fig. 4 The spread of new drug adoption through an advisory network of physicians: two snapshots at different time points, about two years apart (from left to right). The growth dynamics in the number of adopters can be analyzed with an Ising model.


Recently, the Ising model has been used to examine social behaviors [148], including collective decision making, opinion formation and the adoption of new technologies or products [50, 60, 84]. For example, Fellows et al. proposed a random model of the full network by modeling nodal attributes as random variates, and used this formulation to analyze a peer social network from the National Longitudinal Study of Adolescent Health [45]. Agliari et al. proposed a model to extract the underlying dynamics of social systems based on diffusive effects and people's strategic choices to convince others [3]. By adapting an Ising-based cost function to social interactions between individuals, they showed by numerical simulation that the natural dynamics of social systems reach a steady state.
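Opinion formation under an Ising-type model is commonly simulated with single-site updates. The sketch below uses heat-bath (Gibbs) updates for the $\{0,1\}$ pairwise model: because the joint probability is proportional to $\exp(\sum \theta_{jk} x_j x_k)$, the conditional log-odds of $x_j = 1$ is $\theta \sum_{k \in N(j)} x_k$ when every coupling equals a common $\theta$. The homogeneous coupling, the ring network and the initial opinions are assumptions for illustration; this is not the cost-function model of Agliari et al.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def gibbs_opinion_dynamics(adj, theta, x0, sweeps, rng=None):
    """Heat-bath updates for a {0,1} Ising model with a common coupling theta.

    adj:   dict node -> list of neighbours (hypothetical network).
    theta: shared coupling theta_jk = theta for every edge (an assumption).
    x0:    dict node -> initial opinion in {0, 1}.
    Returns the fraction of nodes in state 1 ("adopters") after each sweep.
    """
    rng = rng or random.Random()
    x = dict(x0)
    history = []
    for _ in range(sweeps):
        for j in sorted(adj):
            # Conditional log-odds of x_j = 1 given the current neighbours.
            p_one = sigmoid(theta * sum(x[k] for k in adj[j]))
            x[j] = 1 if rng.random() < p_one else 0
        history.append(sum(x.values()) / len(x))
    return history


# Hypothetical 20-node ring with 5 initial adopters.
RING = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
START = {i: int(i < 5) for i in range(20)}
trace = gibbs_opinion_dynamics(RING, 1.5, START, sweeps=200, rng=random.Random(0))
```

Tracking the adopter fraction per sweep, as in `trace`, is one simple way to see whether the chain settles around a stationary level.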


18 Alireza Farasat et al. 





Exponential Random Graph Models (ERGMs) [154], also known as p*-class models, have been among the most widely used approaches to modeling social networks in recent years [47, 113, 120, 121, 134]. A social network of individuals is denoted by a graph G with N nodes and M edges, $M \le \binom{N}{2}$. The corresponding adjacency matrix of G is denoted by $Y = [y_{ij}]_{N \times N}$, where $y_{ij}$ is a random variable defined as follows:

$$y_{ij} = \begin{cases} 1 & \text{if there exists a link between nodes } i \text{ and } j, \\ 0 & \text{otherwise,} \end{cases} \qquad \forall\, i, j,\ i \neq j.$$


Based on an ERGM, the probability of an observed network, $y$, is:

$$P(Y = y; \theta) = \frac{1}{Z} \exp\left(\sum_{i=1}^{K} \theta_i f_i(y)\right), \qquad (3)$$

where $f_i(y)$, $i = 1, \ldots, K$, are called sufficient statistics [98, 102], based on configurations of the observed graph, and $\theta = \{\theta_1, \ldots, \theta_K\}$ is a K-vector of parameters (K is the number of statistics used in the model). Network configurations, including but not limited to the edge count (ties between two actors), as well as counts of 2-stars (two ties sharing an actor) and triads of various types, are related to communication patterns among actors in a social network (see [98] for more details about network configurations). The parameters of ERGMs can reflect a wide variety of possible configurations in social networks [119]. Finally, Z is the normalizing constant, obtained by summing the exponential term over all possible networks on the N nodes.
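For a tiny network, Equation 3 can be evaluated exactly by enumerating all $2^{N(N-1)/2}$ undirected graphs. The sketch below uses edges, 2-stars and triangles as the sufficient statistics $f_1, f_2, f_3$; the choice of statistics and the parameter values are illustrative. Practical ERGM software instead relies on MCMC-based estimation, because Z is intractable for realistic N.

```python
import itertools
import math


def ergm_stats(edges, n):
    """Sufficient statistics f(y) = (edge count, 2-star count, triangle count)."""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    two_stars = sum(len(adj[i]) * (len(adj[i]) - 1) // 2 for i in range(n))
    triangles = sum(1 for i, j, k in itertools.combinations(range(n), 3)
                    if j in adj[i] and k in adj[i] and k in adj[j])
    return (len(edges), two_stars, triangles)


def ergm_distribution(theta, n):
    """Exact Equation (3) for a tiny undirected network by full enumeration."""
    dyads = list(itertools.combinations(range(n), 2))
    graphs = [tuple(d for d, on in zip(dyads, mask) if on)
              for mask in itertools.product([0, 1], repeat=len(dyads))]
    weights = [math.exp(sum(t * f for t, f in zip(theta, ergm_stats(g, n))))
               for g in graphs]
    z = sum(weights)  # normalizing constant over all 2^(n choose 2) graphs
    return {g: w / z for g, w in zip(graphs, weights)}


# Hypothetical parameters: sparse ties, mild star and triangle tendencies.
DIST = ergm_distribution((-1.0, 0.2, 0.5), 4)
```

With only the edge statistic ($\theta_2 = \theta_3 = 0$), the model reduces to a Bernoulli random graph in which each tie is present independently with probability $1/(1 + e^{-\theta_1})$.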


Some of the first proposed models, e.g., random graphs and $p_1$ models [47], used Bernoulli and dyadic dependence structures, which are generally overly simplistic [120]. In contrast, ERGMs can be based on the Markov dependence assumption [47], which supposes that two possible ties are conditionally dependent when they share an actor (node). Moreover, the Markov dependence assumption can be extended to attributed networks, in which each node has a set of attributes influencing its possible incoming and outgoing ties [120] (e.g., more experienced actors in an advisory network receive more incoming ties). When nodal attributes are taken into account as random variables, ERGMs and MNs can be integrated to model the social network, owing to the similarities they share (see the Appendix and [45, 98, 144]).
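The Markov dependence assumption has a convenient consequence: the conditional log-odds of a single tie $y_{ij}$, given the rest of the network, equals $\theta \cdot \delta_{ij}(y)$, where $\delta_{ij}(y)$ is the change in the statistics when the tie is toggled from 0 to 1. This conditional form underlies maximum pseudo-likelihood estimation. The sketch below assumes edge, 2-star and triangle statistics; only ties sharing an actor with $i$ or $j$ enter the calculation.

```python
import math


def change_stats(adj, i, j):
    """Change in (edges, 2-stars, triangles) when tie i-j is switched on.

    adj: dict node -> set of neighbours, with the i-j tie currently absent.
    Only ties incident to i or j are consulted (Markov dependence).
    """
    d_edges = 1
    # Adding i-j raises deg(i) and deg(j) by one, creating deg(i) + deg(j) 2-stars.
    d_two_stars = len(adj[i]) + len(adj[j])
    # Each common neighbour of i and j closes one new triangle.
    d_triangles = len(adj[i] & adj[j])
    return (d_edges, d_two_stars, d_triangles)


def conditional_tie_prob(adj, i, j, theta):
    """P(y_ij = 1 | rest of the network) = logistic(theta . delta_ij(y))."""
    eta = sum(t * d for t, d in zip(theta, change_stats(adj, i, j)))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical network: ties 0-1, 0-2, 1-2, 1-3; node 4 is isolated.
ADJ = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}, 4: set()}
p_03 = conditional_tie_prob(ADJ, 0, 3, (-1.0, 0.2, 0.5))
```

Because each conditional is a logistic function of change statistics, pseudo-likelihood fitting amounts to a logistic regression over dyads, at the price of ignoring dependence between the conditionals.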


ERGMs have been widely employed to study network and friendship formation [135] and to infer global network structure from the local structure of the observed network [146]. The observed network is considered one realization from many possible networks with similar important characteristics [120]. For example, Broekel et al. used ERGMs to identify factors determining the structure of inter-organizational networks based on a single observation [15]. Schaefer et al. used SNA to study the relation between weight status and friend selection, and ERGMs to measure the effects of body mass index on friend selection [128].
Moreover, Goodreau et al. used ERGMs to examine the generative processes that give rise to widespread patterns in friendship networks [59]. Cranmer and Desmarais used ERGMs to model co-sponsorship networks in the U.S. Congress and conflict networks in the international system. They found that several previously unexplored network parameters are informative predictors of the legislative co-sponsorship network of the U.S. House of Representatives [32].


ERGMs have also been used to model changing communication network structure and to classify networks based on the occurrence of their local features [146], as well as to identify the effects of micro-level structural properties of physician collaboration networks on hospitalisation cost and readmission rate [147]. Finally, an ERGM-based model for clustering nodes according to their role in the network has been reported [126].


5 Discussion 


Mining social networks for knowledge and discovery has proven to be a very challenging and active research area [79]. This review focused on PGMs and motivated their use in social networks through a variety of diverse applications. An important consideration is scalability, which is a major challenge not only for PGMs but for SNA in general. Structure and parameter learning in high dimensions can be prohibitive. In practice, several different network structures may be plausible and equally likely. Moreover, both greedy and sampling-based search strategies can get stuck in local optima. These numerical caveats can give rise to misleading networks, generating models, and subsequent predictions. ERGMs can exhibit degeneracy [64], which occurs when networks generated from the fitted model show little resemblance to the observed network. Modifications to the concept of goodness of fit have been proposed to safeguard against degeneracy [58, 71].


Social networks continuously evolve over time. The methods we have discussed use either a static snapshot of the social network at a given time or a fixed template structure that captures the dynamics. Template-based dynamics have proven their utility in a few social network applications; however, their assumptions are overly simplistic. More realistically, social networks can give rise to several interrelated streams that contain complex overlapping relational data [83]. Moreover, communities drift as new members join, old members leave or become inactive, and activities change. PGMs are not equipped to model dynamics of this type. Data stream mining is an active area of research that aims to analyze web data as a stream, upon arrival [86]. There are considerable challenges related not only to the sheer volume and speed at which data are processed, but also to changes in the features or targets being processed. Another major challenge, which has been extensively studied, is concept drift [51]. This phenomenon occurs when the probability distributions of features and targets change over time, i.e., when the distributions underlying the stream change. Estimation of posterior probabilities in DBNs is similar in spirit to drift estimation, but much more severely constrained due to the Markov assumption.


Alternative methods for modeling network dynamics have been proposed, including latent modeling approaches and the adoption of smooth transition assumptions. Sarkar et al. proposed a latent space model which assumes smooth transitions between time steps, i.e., networks that change drastically from one time step to the next are assigned a lower probability. They also adopt a standard Markov assumption, which states that time step t+1 is conditionally independent of all previous time steps given t; this is the assumption adopted in our discussion of DBNs. Hoff et al. describe a latent space approach that maps actors into a social space, leveraging assumed transitive tendencies in relationships to estimate proximity in the latent space [69]. The iterative Facetnet algorithm frames the dynamic problem in terms of non-negative matrix factorization and uses the Kullback-Leibler divergence to enforce temporal smoothness [95]. TESLA builds on the well-known LASSO for sparse regression and additionally penalizes changes between time steps using $\ell_1$-regularization [4]. The TESLA algorithm was tested on both biological and social networks.


In this review, we survey directed and undirected PGMs and highlight their applications in modern social networks. Despite limitations related to scalability and inference, it is our opinion that the utility of PGMs has been somewhat under-realized in the social network arena. It is indisputable that methods for understanding social networks have not kept pace with the data explosion. There are several relevant topics and opportunities in social networks, e.g., link prediction, collective classification, modeling information diffusion, entity resolution, and viral marketing, where conditional independencies can be leveraged to improve performance. PGMs encode conditional independence through their graph structure and provide flexible modeling paradigms, which hold tremendous promise and untapped opportunity for SNA.
7 Appendix


Similarity between MNs and ERGMs. 

While MNs and ERGMs were developed in different scientific domains, they both specify exponential family distributions. MN models treat social network nodes as random variables; hence, their utility is most obvious in modeling processes on networks. ERGMs, on the other hand, were conceptualized to model network formation, where the edge presence indicators are treated as random variables (these random variables are dependent if their corresponding edges share a node). In fact, however, this application-related difference in what to treat as random is not fundamental. This Appendix aims to disclose more rigorously the similarity between MNs and ERGMs by re-defining an ERGM as a PGM. We begin, however, by reviewing the branch of literature devoted exclusively to ERGMs.
Similar to MNs, a well-discussed problem of ERGMs for analyzing social networks is parameter estimation [122], owing to the lack of sufficient observed data. Robins et al. outline this and other problems associated with ERGMs, e.g., degeneracy in model selection and bimodal distribution shapes [122] (see also [62, 64, 118, 134]).


The roots of ERGMs in the Principle of Maximum Entropy [112] and the Hammersley-Clifford theorem have been previously pointed out [56, 119]. Here, we illustrate how MNs and ERGMs are similar in form and structure using the most popular sufficient statistics in ERGMs. Under the assumption of Markov dependence, for a given social network, one can build a corresponding Markov network via the following conversion: 1) each node in the Markov network corresponds to an edge in the social network (Fienberg called this construct a “usual graphical model” for ERGMs [46]); 2) when two edges share a node in the social network, a link is built between the two corresponding nodes in the Markov network.
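The conversion above can be sketched in a few lines: each possible tie of an N-actor network becomes a node, and two nodes are linked when their ties share an actor. Maximal cliques are then enumerated with the Bron-Kerbosch algorithm with pivoting, a standard algorithm chosen for this sketch. For N = 5 it yields 10 nodes and 15 maximal cliques: ten of size three (one per triple of actors) and five of size four (one per actor's star of possible ties).

```python
import itertools


def markov_graph(n_actors):
    """Markov (dependence) graph over all possible ties of an n-actor network.

    Nodes are possible ties (i, j), actors labelled 1..n; two nodes are
    linked when the corresponding ties share an actor (Markov dependence).
    """
    ties = list(itertools.combinations(range(1, n_actors + 1), 2))
    nbrs = {t: set() for t in ties}
    for a, b in itertools.combinations(ties, 2):
        if set(a) & set(b):
            nbrs[a].add(b)
            nbrs[b].add(a)
    return nbrs


def maximal_cliques(nbrs):
    """Bron-Kerbosch with pivoting; yields each maximal clique as a set."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        pivot = max(p | x, key=lambda u: len(nbrs[u] & p))
        for v in list(p - nbrs[pivot]):
            yield from expand(r | {v}, p & nbrs[v], x & nbrs[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(nbrs), set())


G5 = markov_graph(5)
CLIQUES = list(maximal_cliques(G5))
SIZES = sorted(len(c) for c in CLIQUES)
```

The size-four cliques collect the ties of a single actor (the star configurations), and the size-three cliques collect the ties of a triangle of actors, which is why star and triangle statistics arise naturally as clique features.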


Corresponding to each possible edge in a social network, a node in an MN is introduced; note the difference between the original social network and the MN: they are not the same! Consider an ERGM whose sufficient statistics include the number of edges, $f_1(y)$, the number of $i$-stars, $f_i(y)$, $i = 2, \ldots, N-1$, and the number of triangles, $f_N(y)$. In an MN, a maximum entropy (maxent) model proposes the following form for the internal energy of clique $c$: $E_c(x) = -\sum_i \alpha_{ci}\, g_{ci}$, where $g_{ci}$ is the $i$th feature of clique $c \in \Omega$ and $\alpha_{ci}$ is its corresponding weight in $G$. Thus, $\psi_c(x) = \exp\{\sum_i \alpha_{ci}\, g_{ci}\}$. Since the MN has too many parameters, their number can be reduced by imposing homogeneity constraints similar to those of ERGMs [120]. Before imposing such constraints, the following facts are required.


It is straightforward to demonstrate that $G$ encompasses cliques of sizes $3, \ldots, N-1$. In addition, all substructures of the social network can be redefined by features in $G$. Considering these points, we can rewrite the joint probability of all variables represented by the MN, $P(X)$, as follows:

$$P(X) = \frac{1}{Z(\alpha)} \prod_{c=1}^{C} \psi_c(x) = \frac{1}{Z(\alpha)} \exp\left(\sum_{c=1}^{C} \sum_{i=1}^{N} \alpha_{ci}\, g_{ci}\right). \qquad (4)$$


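In (4), $Z(\alpha)$ is the partition function, which is a function of the parameters. The homogeneity assumption here means that feature $i$ carries a common weight $\theta'_i$ across all cliques, scaled by a clique weight $\beta_c$, i.e., $\alpha_{ci} = \beta_c\,\theta'_i$ for all $c = 1, \ldots, C$; then $P(X)$ is:

$$P(X) = \frac{1}{Z(\theta')} \exp\left(\sum_{i=1}^{N} \theta'_i \sum_{c=1}^{C} \beta_c\, g_{ci}\right). \qquad (5)$$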


In (5), let $Z' = Z(\theta')$. In addition, we denote $\sum_{c=1}^{C} \beta_c\, g_{ci}$ by $f'_i$, meaning that substructure $i$ is added up over all cliques $c$ with weight $\beta_c$.
Fig. 5 A social network with five actors (left) and its corresponding Markov network (right).


Finally, replacing $\sum_{c} \beta_c\, g_{ci}$ by $f'_i$ in (5) gives:

$$P(X) = \frac{1}{Z'} \exp\left(\sum_{i=1}^{N} \theta'_i\, f'_i\right). \qquad (6)$$


Comparing $P(Y = y)$ in (3) with (6) confirms that ERGMs and MNs are similar, and under the following conditions they are identical:

1) $\theta_i = \theta'_i$;

2) $f_i = f'_i = \sum_{c} \beta_c\, g_{ci}$.


The following numerical example illustrates the similarities between ERGMs and MNs. A social network with five actors, $N = 5$, is assumed (Figure 5, left). Under the Markov dependence assumption, there exists a unique corresponding Markov network, shown in Figure 5 (right), with 10 nodes. There are 15 cliques (so-called factors) of size three or four,

$$\Phi = \{\phi_1(y_{12}, y_{13}, y_{14}, y_{15}), \ldots, \phi_{15}(y_{24}, y_{45}, y_{25})\}.$$
As already mentioned, the joint probability function of all variables in each clique is determined by its internal energy. For instance:

$$\phi_1(x) = \frac{1}{\lambda} \exp\{-E_1(y_{12}, y_{13}, y_{14}, y_{15})\},$$

where $E_1(x) = -\sum_i \alpha_{1i}\, g_{1i}$ and $\lambda$ is the distribution parameter. This simple example shows how ERGMs and MNs are the same in terms of the underlying concept and the expressed probability distribution.


